Automated Exploratory Data Analysis
Here the main aim is to understand more about the data. I will explore it using the following steps.
Using automated Exploratory Data Analysis (EDA) tools in Python can be beneficial and time-saving, especially when dealing with large datasets or during the initial stages of a data science project. Automated EDA tools can help data scientists and analysts quickly gain insights into the data, understand its structure, detect patterns, and identify potential issues. Here are some advantages of using automated EDA tools:
- Time Efficiency: Automated EDA tools can perform a wide range of data analysis tasks quickly, providing summary statistics, data visualizations, and data profiling with just a few lines of code. This saves time compared to manually writing code for individual analyses.
- Data Understanding: Automated EDA tools generate various visualizations and summary statistics, allowing data scientists to quickly grasp the characteristics of the dataset, including data types, distributions, missing values, and correlations.
- Data Cleaning and Preprocessing: Automated EDA tools can identify missing values, outliers, and data inconsistencies, which helps data scientists in the data cleaning and preprocessing stages.
- Quick Insights: EDA tools can automatically generate visualizations and statistical summaries, enabling data scientists to quickly identify interesting patterns or trends that may require further investigation.
- Interactive Exploration: Some automated EDA tools offer interactive visualizations, allowing data scientists to explore the data dynamically and customize visualizations based on specific needs.

However, there are some considerations when using automated EDA tools:
- Limited Customization: Automated EDA tools may not offer the same level of customization as manually written code. While they can provide quick insights, complex or domain-specific analyses might require custom code.
- Domain Understanding: Automated EDA tools can help with initial data exploration, but they do not replace the need for domain expertise and a deeper understanding of the data.
- Data Privacy and Security: When using third-party EDA tools or libraries, be mindful of data privacy and security concerns. Make sure the data does not contain sensitive information before using automated EDA tools.
- Complement Manual Analysis: Automated EDA tools should be used alongside traditional EDA techniques and domain-specific analyses, as a quick way to get an overview of the data before diving into more detailed explorations.

Overall, automated EDA tools can be valuable for exploratory analysis and gaining initial insights into the data. They are particularly helpful for quickly understanding large datasets and can serve as a starting point for more in-depth analyses. However, they should be used judiciously, and data scientists should continue to apply their expertise and domain knowledge throughout the data exploration process.
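Before reaching for any automated tool, the same first-pass questions (data types, summary statistics, missing values) can be answered with a few lines of plain pandas. The sketch below uses a small synthetic DataFrame purely for illustration; the column names are made up and are not from the loan dataset analyzed later.

```python
import pandas as pd
import numpy as np

# Small synthetic frame standing in for a real dataset (illustrative only)
df = pd.DataFrame({
    "loan_amount": [1000.0, 2500.0, np.nan, 4000.0],
    "grade": ["A", "B", "B", "C"],
})

print(df.shape)           # number of rows and columns
print(df.dtypes)          # data type of each column
print(df.describe())      # summary statistics for numeric columns
print(df.isna().sum())    # count of missing values per column
```

Automated EDA tools essentially bundle dozens of such checks, plus visualizations, into a single call.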
(1) Importing the Data
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Displaying all the columns of the DataFrame
pd.set_option('display.max_columns', None)
# Displaying all the rows of the DataFrame
# pd.set_option('display.max_rows', None)
dataset = pd.read_csv("/Users/lokaraju/JOBS/Loan Project /LoanDataset.csv")
dtale is a Python library that provides an interactive web interface for data analysis and exploration. It lets you visualize and explore pandas DataFrames with minimal effort, making it convenient for both data scientists and analysts. The library generates an interactive dashboard where you can interact with your data in real time, view statistics, create plots, filter rows, and more.
D-Tale's interactive dashboard provides a convenient way to quickly gain insights into your data without the need for writing additional code. It can be a valuable tool for data exploration and analysis during the initial stages of a data science project or when working with new datasets.
#!pip -q install dtale
import dtale
dtale.show(dataset)
#!pip -q install pandas_visual_analysis
import pandas_visual_analysis
from pandas_visual_analysis import VisualAnalysis
df=dataset.copy()
VisualAnalysis(df)
#!pip -q install ydata-profiling
# pandas-profiling has been renamed; `import pandas_profiling` is deprecated in favor of ydata_profiling
from ydata_profiling import ProfileReport
profile=ProfileReport(df, explorative=True)
#profile.to_file('Output.html')
profile
#!pip -q install sweetviz
import sweetviz as sv
report = sv.analyze(df)
report.show_notebook()